Consistency Is the Key: Detecting Hallucinations in LLM Generated Text By Checking Inconsistencies About Key Facts

Gupta, Raavi, Panicker, Pranav Hari, Bhatia, Sumit, Ramakrishnan, Ganesh

arXiv.org Artificial Intelligence

Large language models (LLMs), despite their remarkable text generation capabilities, often hallucinate and generate text that is factually incorrect and not grounded in real-world knowledge. This poses serious risks in domains like healthcare, finance, and customer support. A typical way to use LLMs is via the APIs provided by LLM vendors, where there is no access to model weights and no option to fine-tune the model. Existing methods to detect hallucinations in such settings, where model access is restricted or constrained by resources, typically require making multiple LLM API calls, increasing latency and API cost. We introduce CONFACTCHECK, an efficient hallucination detection approach that does not rely on any external knowledge base and works on the simple intuition that responses to factual probes within the generated text should be consistent within a single LLM and across different LLMs. Rigorous empirical evaluation on multiple datasets covering both factual text generation and open-ended generation shows that CONFACTCHECK can detect hallucinated facts efficiently using fewer resources and achieves higher accuracy scores than existing baselines that operate under similar conditions. Our code is available here.
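The consistency intuition behind this line of work can be sketched as follows: re-ask the same factual probe several times (or across models) and flag facts whose answers disagree. This is a minimal illustration only; the actual CONFACTCHECK pipeline, its probe construction, and its agreement test are more involved, and all function names here are assumptions.

```python
# Minimal sketch of consistency-based hallucination flagging (illustrative,
# not the CONFACTCHECK implementation): a fact is suspect when repeated
# probes of the LLM do not converge on the same answer.
from collections import Counter


def normalize(answer: str) -> str:
    """Canonicalize an answer for comparison (lowercase, drop punctuation)."""
    return "".join(c for c in answer.lower() if c.isalnum() or c.isspace()).strip()


def is_consistent(answers: list[str], threshold: float = 0.8) -> bool:
    """Treat a fact as consistent if a large majority of probe answers agree."""
    counts = Counter(normalize(a) for a in answers)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(answers) >= threshold


def flag_hallucinations(probe_answers: dict[str, list[str]]) -> list[str]:
    """Return the probes whose answers disagree (candidate hallucinations)."""
    return [probe for probe, ans in probe_answers.items() if not is_consistent(ans)]
```

In practice the answer lists would come from repeated (or cross-model) LLM API calls for each factual probe extracted from the generated text; a stable fact like a well-known capital converges, while a fabricated date tends to vary across samples.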


Aligning LLMs for the Classroom with Knowledge-Based Retrieval -- A Comparative RAG Study

Jain, Amay, Cui, Liu, Chen, Si

arXiv.org Artificial Intelligence

Large language models like ChatGPT are increasingly used in classrooms, but they often provide outdated or fabricated information that can mislead students. Retrieval Augmented Generation (RAG) improves the reliability of LLMs by grounding responses in external resources. We investigate two accessible RAG paradigms, vector-based and graph-based retrieval, to identify best practices for classroom question answering (QA). Existing comparative studies fail to account for pedagogical factors such as educational disciplines, question types, and practical deployment costs. Using EduScopeQA, a novel dataset of 3,176 questions across academic subjects, we measure performance on various educational query types, from specific facts to broad thematic discussions. We also evaluate system alignment with a dataset of systematically altered textbooks that contradict the LLM's latent knowledge. We find that OpenAI Vector Search RAG (representing vector-based RAG) performs well as a low-cost generalist, especially for quick fact retrieval. On the other hand, GraphRAG Global excels at providing pedagogically rich answers to thematic queries, and GraphRAG Local achieves the highest accuracy on the dense, altered textbooks when corpus integrity is critical. Accounting for the 10-20x higher resource usage of GraphRAG (representing graph-based RAG), we show that a dynamic branching framework that routes queries to the optimal retrieval method boosts fidelity and efficiency. These insights provide actionable guidelines for educators and system designers to integrate RAG-augmented LLMs into learning environments effectively.
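The dynamic branching idea can be sketched as a router that sends broad thematic questions to the expensive graph-based path and quick factual lookups to the cheap vector-based path. The cue-word heuristic below is an assumption for illustration; the paper's actual routing criteria may differ.

```python
# Illustrative sketch of a dynamic branching router between RAG back ends.
# The keyword heuristic is an assumption, not the paper's routing method.
THEMATIC_CUES = ("why", "how", "compare", "discuss", "explain", "theme", "significance")


def route_query(question: str) -> str:
    """Send thematic/broad questions to graph RAG, quick facts to vector RAG."""
    words = question.lower().split()
    if any(cue in words or question.lower().startswith(cue) for cue in THEMATIC_CUES):
        return "graph_rag"   # pedagogically rich answers, 10-20x higher cost
    return "vector_rag"      # low-cost generalist for fact retrieval
```

A production router would more plausibly use a small classifier or an LLM call to label the query type, but even this cheap gate captures the cost/fidelity trade-off the abstract describes.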


Artificial Intelligence Fact Sheet - Content Science Review

#artificialintelligence

Content Science is a content strategy and intelligence firm based in Atlanta, GA. Founded in 2010 by Colleen Jones, author of Clout: The Art and Science of Influential Web Content, our mission is to transform industries, organizations, and individuals for the better by putting content first. We offer professional services, publications, and software for clients ranging from Fortune 50 companies to nonprofits to government agencies.


Question Answering from Frequently Asked Question Files: Experiences with the FAQ FINDER System

Burke, Robin D., Hammond, Kristian J., Kulyukin, Vladimir, Lytinen, Steven L., Tomuro, Noriko, Schoenberg, Scott

AI Magazine

This article describes FAQ FINDER, a natural language question-answering system that uses files of frequently asked questions as its knowledge base. Unlike AI question-answering systems that focus on the generation of new answers, FAQ FINDER retrieves existing ones found in frequently asked question files. Unlike information-retrieval approaches that rely on a purely lexical metric of similarity between query and document, FAQ FINDER uses a semantic knowledge base (WORDNET) to improve its ability to match question and answer. We include results from an evaluation of the system's performance and show that a combination of semantic and statistical techniques works better than any single approach.
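The combination of semantic and statistical matching the article describes can be sketched with a toy scorer: a lexical overlap term plus a synonym-based term, with a small hand-built synonym map standing in for WORDNET. The weights and scoring functions below are illustrative assumptions, not the system's actual metric.

```python
# Toy sketch of FAQ FINDER-style question matching: blend a statistical
# (lexical overlap) score with a semantic score from a synonym map that
# stands in for WORDNET. All weights and names here are assumptions.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "fix": {"repair", "mend"},
}


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def lexical_score(query: str, faq_question: str) -> float:
    """Jaccard overlap between query and FAQ-question word sets."""
    q, f = tokens(query), tokens(faq_question)
    return len(q & f) / len(q | f) if q | f else 0.0


def semantic_score(query: str, faq_question: str) -> float:
    """Fraction of query words with a synonym appearing in the FAQ question."""
    q, f = tokens(query), tokens(faq_question)
    hits = sum(1 for w in q if f & SYNONYMS.get(w, set()))
    return hits / len(q) if q else 0.0


def match_score(query: str, faq_question: str, alpha: float = 0.5) -> float:
    """Weighted blend of lexical and semantic similarity."""
    return alpha * lexical_score(query, faq_question) + (1 - alpha) * semantic_score(query, faq_question)
```

The point of the blend, as in the article's evaluation, is that "how fix car" and "how to repair an automobile" share almost no surface vocabulary, so a purely lexical metric scores them low, while the semantic term recovers the match.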